Overview

Dataset statistics

Number of variables: 14
Number of observations: 18249
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 1.9 MiB
Average record size in memory: 112.0 B
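As a sanity check, the memory figures can be reproduced with simple arithmetic. This minimal sketch assumes every cell takes 8 bytes (int64 and float64 values, with object columns counted shallowly as 8-byte pointers) and ignores the small index overhead. These are assumptions for illustration, not pandas-profiling's own code.

```python
# Back-of-the-envelope check of the memory figures in the overview.
# Assumption: every cell is 8 bytes (int64/float64, or an 8-byte pointer for
# object columns when memory is counted shallowly); index overhead is ignored.
ROWS = 18249
COLUMNS = 14
BYTES_PER_CELL = 8  # assumed, see above

column_kib = ROWS * BYTES_PER_CELL / 1024              # per-variable "Memory size"
total_mib = ROWS * COLUMNS * BYTES_PER_CELL / 1024**2  # "Total size in memory"
record_bytes = COLUMNS * BYTES_PER_CELL                # "Average record size in memory"

print(f"Memory size per column: {column_kib:.1f} KiB")
print(f"Total size in memory: {total_mib:.1f} MiB")
print(f"Average record size: {record_bytes:.1f} B")
```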

Variable types

NUM: 10
CAT: 4

Reproduction

Analysis started: 2020-05-12 08:24:02.247708
Analysis finished: 2020-05-12 08:24:30.456472
Duration: 28.21 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]

Warnings

Date has high cardinality: 169 distinct values [High cardinality]
region has high cardinality: 54 distinct values [High cardinality]
4046 is highly correlated with Total Volume and 3 other fields [High correlation]
Total Volume is highly correlated with 4046 and 3 other fields [High correlation]
4225 is highly correlated with Total Volume and 3 other fields [High correlation]
Total Bags is highly correlated with Total Volume and 4 other fields [High correlation]
Small Bags is highly correlated with Total Volume and 4 other fields [High correlation]
Large Bags is highly correlated with Total Bags and 1 other field [High correlation]
Date is uniformly distributed [Uniform]
region is uniformly distributed [Uniform]
Unnamed: 0 has 432 (2.4%) zeros [Zeros]
4046 has 242 (1.3%) zeros [Zeros]
4770 has 5497 (30.1%) zeros [Zeros]
Large Bags has 2370 (13.0%) zeros [Zeros]
XLarge Bags has 12048 (66.0%) zeros [Zeros]
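The zero and high-cardinality warnings follow from the per-variable counts reported below. This sketch re-derives them under two assumed thresholds: a categorical variable is flagged above 50 distinct values, and a numeric variable is flagged when more than 1% of its values are zero. Both thresholds and both function names are hypothetical. They were chosen only because they are consistent with which columns are and are not flagged here; they are not read from config.yaml.

```python
# Hypothetical re-derivation of two warning types from the per-variable counts.
# Both thresholds are assumptions chosen to match the flagged columns above;
# they are not read from config.yaml.
N_ROWS = 18249
CARDINALITY_THRESHOLD = 50  # assumed: flag categorical columns with more distinct values
ZEROS_THRESHOLD = 0.01      # assumed: flag numeric columns with a larger share of zeros

categorical_distinct = {"Date": 169, "type": 2, "year": 4, "region": 54}
numeric_zeros = {
    "Unnamed: 0": 432, "AveragePrice": 0, "Total Volume": 0, "4046": 242,
    "4225": 61, "4770": 5497, "Total Bags": 15, "Small Bags": 159,
    "Large Bags": 2370, "XLarge Bags": 12048,
}


def high_cardinality(distinct_counts, threshold=CARDINALITY_THRESHOLD):
    """Names of categorical columns whose distinct count exceeds the threshold."""
    return [name for name, d in distinct_counts.items() if d > threshold]


def zero_warnings(zero_counts, n_rows=N_ROWS, threshold=ZEROS_THRESHOLD):
    """Warning messages for columns whose share of zeros exceeds the threshold."""
    return [
        f"{name} has {z} ({z / n_rows:.1%}) zeros"
        for name, z in zero_counts.items()
        if z / n_rows > threshold
    ]


print(high_cardinality(categorical_distinct))  # ['Date', 'region']
for message in zero_warnings(numeric_zeros):
    print(message)
```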

Variables

Unnamed: 0
Real number (ℝ≥0)

ZEROS

Distinct count: 53
Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 24.232231903117977
Minimum: 0
Maximum: 52
Zeros: 432
Zeros (%): 2.4%
Memory size: 142.6 KiB

Quantile statistics

Minimum: 0
5th percentile: 2
Q1: 10
median: 24
Q3: 38
95th percentile: 49
Maximum: 52
Range: 52
Interquartile range (IQR): 28
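The quantile statistics can be computed from the sorted values. This sketch assumes linear interpolation between order statistics, which is the default in NumPy and pandas and which Python's `statistics.quantiles` reproduces with `method="inclusive"`. The helper name `quantile_statistics` is hypothetical, and the toy data are not taken from this dataset.

```python
import statistics


def quantile_statistics(values):
    """Quantile summary using linear interpolation between order statistics.

    Needs at least two data points on older Python versions.
    """
    data = sorted(values)
    # method="inclusive" interpolates at position p * (n - 1), like NumPy's default.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    ventiles = statistics.quantiles(data, n=20, method="inclusive")  # 5%, 10%, ..., 95%
    return {
        "Minimum": data[0],
        "5th percentile": ventiles[0],
        "Q1": q1,
        "median": median,
        "Q3": q3,
        "95th percentile": ventiles[-1],
        "Maximum": data[-1],
        "Range": data[-1] - data[0],
        "Interquartile range (IQR)": q3 - q1,
    }


stats = quantile_statistics([2, 4, 4, 5, 7, 9, 10])
for name, value in stats.items():
    print(f"{name}: {value:g}")
```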

Descriptive statistics

Standard deviation: 15.48104475
Coefficient of variation (CV): 0.6388616953
Kurtosis: -1.254364272
Mean: 24.2322319
Median Absolute Deviation (MAD): 14
Skewness: 0.1083337271
Sum: 442214
Variance: 239.6627467
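The descriptive statistics follow standard formulas. CV is the standard deviation divided by the mean, variance is the squared standard deviation, and MAD is the median of the absolute deviations from the median. The sketch below assumes the conventions used by pandas' `Series.std()`, `Series.skew()` and `Series.kurt()`: the sample (n − 1) standard deviation, the adjusted Fisher-Pearson skewness and the bias-corrected excess kurtosis. `descriptive_statistics` is a hypothetical helper, not pandas-profiling's API.

```python
import math
import statistics


def descriptive_statistics(values):
    """Report-style descriptive statistics for at least four non-constant numbers.

    Assumed conventions: sample (n - 1) standard deviation and variance,
    adjusted Fisher-Pearson skewness and bias-corrected excess kurtosis.
    """
    data = list(values)
    n = len(data)
    mean = statistics.fmean(data)
    std = statistics.stdev(data)
    median = statistics.median(data)
    # Central moments with an n denominator, used by the skewness and kurtosis formulas.
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3
    return {
        "std": std,
        "cv": std / mean,
        "kurtosis": (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6),
        "mean": mean,
        "mad": statistics.median([abs(x - median) for x in data]),
        "skewness": math.sqrt(n * (n - 1)) / (n - 2) * g1,
        "sum": sum(data),
        "variance": statistics.variance(data),
    }


summary = descriptive_statistics([1, 2, 3, 4, 10])
for name, value in summary.items():
    print(f"{name}: {value:.6g}")

# Cross-check with the figures above: CV = standard deviation / mean.
print(15.48104475 / 24.2322319)  # close to the reported 0.6388616953
```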
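[Figure: Histogram with fixed size bins (bins=10)]

Common values
Value | Count | Frequency (%)
7 | 432 | 2.4%
11 | 432 | 2.4%
1 | 432 | 2.4%
2 | 432 | 2.4%
3 | 432 | 2.4%
4 | 432 | 2.4%
5 | 432 | 2.4%
6 | 432 | 2.4%
8 | 432 | 2.4%
9 | 432 | 2.4%
Other values (43) | 13929 | 76.3%

Minimum 5 values
Value | Count | Frequency (%)
0 | 432 | 2.4%
1 | 432 | 2.4%
2 | 432 | 2.4%
3 | 432 | 2.4%
4 | 432 | 2.4%

Maximum 5 values
Value | Count | Frequency (%)
52 | 107 | 0.6%
51 | 322 | 1.8%
50 | 324 | 1.8%
49 | 324 | 1.8%
48 | 324 | 1.8%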

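The "Common values" and extreme-value tables are plain frequency counts over a column. Below is a minimal sketch with `collections.Counter`. `common_values` and `extreme_values` are hypothetical helper names, and tied counts may be ordered differently than in the report, because `Counter.most_common` keeps first-seen order for equal counts.

```python
from collections import Counter


def common_values(values, top=10):
    """(value, count, share) rows for the `top` most frequent values, plus an "Other values" row."""
    counts = Counter(values)
    n = len(values)
    rows = [(value, count, count / n) for value, count in counts.most_common(top)]
    shown = sum(row[1] for row in rows)
    other_distinct = len(counts) - len(rows)
    if other_distinct:
        rows.append((f"Other values ({other_distinct})", n - shown, (n - shown) / n))
    return rows


def extreme_values(values, k=5):
    """(value, count) pairs for the k smallest and the k largest distinct values."""
    counts = Counter(values)
    ordered = sorted(counts)
    minimum = [(value, counts[value]) for value in ordered[:k]]
    maximum = [(value, counts[value]) for value in reversed(ordered[-k:])]
    return minimum, maximum


sample = [0, 0, 0, 1, 1, 2, 3, 4, 5, 6, 7, 8]
for value, count, share in common_values(sample, top=3):
    print(f"{value} | {count} | {share:.1%}")
low, high = extreme_values(sample, k=2)
print(low)   # [(0, 3), (1, 2)]
print(high)  # [(8, 1), (7, 1)]
```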
Date
Categorical

HIGH CARDINALITY
UNIFORM

Distinct count: 169
Unique (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 142.6 KiB
2017-11-12: 108
2015-01-25: 108
2016-02-28: 108
2016-06-05: 108
2017-03-19: 108
Other values (164): 17709
Value | Count | Frequency (%)
2017-11-12 | 108 | 0.6%
2015-01-25 | 108 | 0.6%
2016-02-28 | 108 | 0.6%
2016-06-05 | 108 | 0.6%
2017-03-19 | 108 | 0.6%
2017-03-05 | 108 | 0.6%
2015-04-05 | 108 | 0.6%
2017-10-22 | 108 | 0.6%
2017-06-11 | 108 | 0.6%
2017-12-10 | 108 | 0.6%
Other values (159) | 17169 | 94.1%

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

AveragePrice
Real number (ℝ≥0)

Distinct count: 259
Unique (%): 1.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.405978409775878
Minimum: 0.44
Maximum: 3.25
Zeros: 0
Zeros (%): 0.0%
Memory size: 142.6 KiB

Quantile statistics

Minimum: 0.44
5th percentile: 0.83
Q1: 1.1
median: 1.37
Q3: 1.66
95th percentile: 2.11
Maximum: 3.25
Range: 2.81
Interquartile range (IQR): 0.56

Descriptive statistics

Standard deviation: 0.4026765555
Coefficient of variation (CV): 0.2864030861
Kurtosis: 0.3251958507
Mean: 1.40597841
Median Absolute Deviation (MAD): 0.28
Skewness: 0.5803027379
Sum: 25657.7
Variance: 0.1621484083
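[Figure: Histogram with fixed size bins (bins=10)]

Common values
Value | Count | Frequency (%)
1.15 | 202 | 1.1%
1.18 | 199 | 1.1%
1.08 | 194 | 1.1%
1.26 | 193 | 1.1%
1.13 | 192 | 1.1%
0.98 | 189 | 1.0%
1.19 | 188 | 1.0%
1.36 | 187 | 1.0%
1.59 | 186 | 1.0%
0.99 | 185 | 1.0%
Other values (249) | 16334 | 89.5%

Minimum 5 values
Value | Count | Frequency (%)
0.44 | 1 | < 0.1%
0.46 | 1 | < 0.1%
0.48 | 1 | < 0.1%
0.49 | 2 | < 0.1%
0.515< 0.1%

Maximum 5 values
Value | Count | Frequency (%)
3.25 | 1 | < 0.1%
3.17 | 1 | < 0.1%
3.12 | 1 | < 0.1%
3.05 | 1 | < 0.1%
3.04 | 1 | < 0.1%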

Total Volume
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 18237
Unique (%): 99.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 850644.0130089321
Minimum: 84.56
Maximum: 62505646.52
Zeros: 0
Zeros (%): 0.0%
Memory size: 142.6 KiB

Quantile statistics

Minimum: 84.56
5th percentile: 2371.862
Q1: 10838.58
median: 107376.76
Q3: 432962.29
95th percentile: 3716315.41
Maximum: 62505646.52
Range: 62505561.96
Interquartile range (IQR): 422123.71

Descriptive statistics

Standard deviation: 3453545.355
Coefficient of variation (CV): 4.059918488
Kurtosis: 92.10445778
Mean: 850644.013
Median Absolute Deviation (MAD): 102962.47
Skewness: 9.007687479
Sum: 1.552340259e+10
Variance: 1.192697552e+13
[Figure: Histogram with fixed size bins (bins=10)]

Common values
Value | Count | Frequency (%)
3713.49 | 2 | < 0.1%
3529.44 | 2 | < 0.1%
2038.99 | 2 | < 0.1%
569349.05 | 2 | < 0.1%
4103.97 | 2 | < 0.1%
9465.99 | 2 | < 0.1%
46602.16 | 2 | < 0.1%
2858.31 | 2 | < 0.1%
7223.46 | 2 | < 0.1%
19634.24 | 2 | < 0.1%
Other values (18227) | 18229 | 99.9%

Minimum 5 values
Value | Count | Frequency (%)
84.56 | 1 | < 0.1%
379.82 | 1 | < 0.1%
385.55 | 1 | < 0.1%
419.98 | 1 | < 0.1%
472.82 | 1 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
62505646.52 | 1 | < 0.1%
61034457.1 | 1 | < 0.1%
52288697.89 | 1 | < 0.1%
47293921.6 | 1 | < 0.1%
46324529.7 | 1 | < 0.1%

4046
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct count: 17702
Unique (%): 97.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 293008.4245306592
Minimum: 0.0
Maximum: 22743616.17
Zeros: 242
Zeros (%): 1.3%
Memory size: 142.6 KiB

Quantile statistics

Minimum: 0
5th percentile: 19.6
Q1: 854.07
median: 8645.3
Q3: 111020.2
95th percentile: 1263359.678
Maximum: 22743616.17
Range: 22743616.17
Interquartile range (IQR): 110166.13

Descriptive statistics

Standard deviation: 1264989.082
Coefficient of variation (CV): 4.317244747
Kurtosis: 86.80911256
Mean: 293008.4245
Median Absolute Deviation (MAD): 8616.69
Skewness: 8.648219757
Sum: 5347110739
Variance: 1.600197377e+12
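[Figure: Histogram with fixed size bins (bins=10)]

Common values
Value | Count | Frequency (%)
0 | 242 | 1.3%
3 | 10 | 0.1%
1.24 | 8 | < 0.1%
1 | 8 | < 0.1%
4 | 8 | < 0.1%
1.25 | 7 | < 0.1%
6 | 7 | < 0.1%
1.21 | 6 | < 0.1%
2.54 | 5 | < 0.1%
1.27 | 5 | < 0.1%
Other values (17692) | 17943 | 98.3%

Minimum 5 values
Value | Count | Frequency (%)
0 | 242 | 1.3%
1 | 8 | < 0.1%
1.13 | 1 | < 0.1%
1.19 | 3 | < 0.1%
1.2 | 1 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
22743616.17 | 1 | < 0.1%
21620180.9 | 1 | < 0.1%
18933038.04 | 1 | < 0.1%
17787611.93 | 1 | < 0.1%
17076650.82 | 1 | < 0.1%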

4225
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 18103
Unique (%): 99.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 295154.56835607433
Minimum: 0.0
Maximum: 20470572.61
Zeros: 61
Zeros (%): 0.3%
Memory size: 142.6 KiB

Quantile statistics

Minimum: 0
5th percentile: 103.614
Q1: 3008.78
median: 29061.02
Q3: 150206.86
95th percentile: 1303657.658
Maximum: 20470572.61
Range: 20470572.61
Interquartile range (IQR): 147198.08

Descriptive statistics

Standard deviation: 1204120.401
Coefficient of variation (CV): 4.079626508
Kurtosis: 91.94902197
Mean: 295154.5684
Median Absolute Deviation (MAD): 28521.3
Skewness: 8.942465608
Sum: 5386275718
Variance: 1.44990594e+12
[Figure: Histogram with fixed size bins (bins=10)]

Common values
Value | Count | Frequency (%)
0 | 61 | 0.3%
215.36 | 3 | < 0.1%
177.87 | 3 | < 0.1%
1.3 | 3 | < 0.1%
94.74 | 3 | < 0.1%
1.26 | 3 | < 0.1%
3478.97 | 2 | < 0.1%
61.01 | 2 | < 0.1%
65.22 | 2 | < 0.1%
5.73 | 2 | < 0.1%
Other values (18093) | 18165 | 99.5%

Minimum 5 values
Value | Count | Frequency (%)
0 | 61 | 0.3%
1.26 | 3 | < 0.1%
1.28 | 2 | < 0.1%
1.3 | 3 | < 0.1%
1.31 | 1 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
20470572.61 | 1 | < 0.1%
20445501.03 | 1 | < 0.1%
20328161.55 | 1 | < 0.1%
18956479.74 | 1 | < 0.1%
17896391.6 | 1 | < 0.1%

4770
Real number (ℝ≥0)

ZEROS

Distinct count: 12071
Unique (%): 66.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 22839.73599265713
Minimum: 0.0
Maximum: 2546439.11
Zeros: 5497
Zeros (%): 30.1%
Memory size: 142.6 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
median: 184.99
Q3: 6243.42
95th percentile: 106156.574
Maximum: 2546439.11
Range: 2546439.11
Interquartile range (IQR): 6243.42

Descriptive statistics

Standard deviation: 107464.0684
Coefficient of variation (CV): 4.705136192
Kurtosis: 132.5634409
Mean: 22839.73599
Median Absolute Deviation (MAD): 184.99
Skewness: 10.15939563
Sum: 416802342.1
Variance: 1.1548526e+10
[Figure: Histogram with fixed size bins (bins=10)]

Common values
Value | Count | Frequency (%)
0 | 5497 | 30.1%
2.66 | 7 | < 0.1%
3.32 | 7 | < 0.1%
1.64 | 6 | < 0.1%
10.97 | 6 | < 0.1%
1.6 | 6 | < 0.1%
1.59 | 6 | < 0.1%
2.74 | 5 | < 0.1%
1.65 | 5 | < 0.1%
1.63 | 5 | < 0.1%
Other values (12061) | 12699 | 69.6%

Minimum 5 values
Value | Count | Frequency (%)
0 | 5497 | 30.1%
0.83 | 1 | < 0.1%
1 | 3 | < 0.1%
1.01 | 1 | < 0.1%
1.09 | 1 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
2546439.11 | 1 | < 0.1%
1993645.36 | 1 | < 0.1%
1896149.5 | 1 | < 0.1%
1880231.38 | 1 | < 0.1%
1811090.71 | 1 | < 0.1%

Total Bags
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 18097
Unique (%): 99.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 239639.20205983886
Minimum: 0.0
Maximum: 19373134.37
Zeros: 15
Zeros (%): 0.1%
Memory size: 142.6 KiB

Quantile statistics

Minimum: 0
5th percentile: 628.89
Q1: 5088.64
median: 39743.83
Q3: 110783.37
95th percentile: 1005478.892
Maximum: 19373134.37
Range: 19373134.37
Interquartile range (IQR): 105694.73

Descriptive statistics

Standard deviation: 986242.3992
Coefficient of variation (CV): 4.115530309
Kurtosis: 112.2721565
Mean: 239639.2021
Median Absolute Deviation (MAD): 37299.96
Skewness: 9.75607167
Sum: 4373175798
Variance: 9.7267407e+11
[Figure: Histogram with fixed size bins (bins=10)]

Common values
Value | Count | Frequency (%)
0 | 15 | 0.1%
300 | 5 | < 0.1%
990 | 5 | < 0.1%
916.67 | 4 | < 0.1%
266.67 | 4 | < 0.1%
550 | 4 | < 0.1%
856.67 | 3 | < 0.1%
153.33 | 3 | < 0.1%
196.67 | 3 | < 0.1%
803.33 | 3 | < 0.1%
Other values (18087) | 18200 | 99.7%

Minimum 5 values
Value | Count | Frequency (%)
0 | 15 | 0.1%
3.09 | 1 | < 0.1%
3.11 | 1 | < 0.1%
3.19 | 1 | < 0.1%
3.33 | 1 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
19373134.37 | 1 | < 0.1%
16394524.11 | 1 | < 0.1%
16298296.29 | 1 | < 0.1%
15972492.07 | 1 | < 0.1%
15804696.31 | 1 | < 0.1%

Small Bags
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 17321
Unique (%): 94.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 182194.68669570936
Minimum: 0.0
Maximum: 13384586.8
Zeros: 159
Zeros (%): 0.9%
Memory size: 142.6 KiB

Quantile statistics

Minimum: 0
5th percentile: 256.67
Q1: 2849.42
median: 26362.82
Q3: 83337.67
95th percentile: 768147.228
Maximum: 13384586.8
Range: 13384586.8
Interquartile range (IQR): 80488.25

Descriptive statistics

Standard deviation: 746178.515
Coefficient of variation (CV): 4.095500964
Kurtosis: 107.0128851
Mean: 182194.6867
Median Absolute Deviation (MAD): 25599.49
Skewness: 9.540659982
Sum: 3324870838
Variance: 5.567823762e+11
[Figure: Histogram with fixed size bins (bins=10)]

Common values
Value | Count | Frequency (%)
0 | 159 | 0.9%
203.33 | 11 | 0.1%
533.33 | 10 | 0.1%
223.33 | 10 | 0.1%
103.33 | 8 | < 0.1%
326.67 | 8 | < 0.1%
300 | 8 | < 0.1%
196.67 | 8 | < 0.1%
263.33 | 8 | < 0.1%
123.33 | 8 | < 0.1%
Other values (17311) | 18011 | 98.7%

Minimum 5 values
Value | Count | Frequency (%)
0 | 159 | 0.9%
2.52 | 1 | < 0.1%
2.57 | 1 | < 0.1%
2.73 | 1 | < 0.1%
2.79 | 1 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
13384586.8 | 1 | < 0.1%
12567155.58 | 1 | < 0.1%
12540327.19 | 1 | < 0.1%
11712807.19 | 1 | < 0.1%
11392828.89 | 1 | < 0.1%

Large Bags
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct count: 15082
Unique (%): 82.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 54338.08814455587
Minimum: 0.0
Maximum: 5719096.61
Zeros: 2370
Zeros (%): 13.0%
Memory size: 142.6 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 127.47
median: 2647.71
Q3: 22029.25
95th percentile: 195699.768
Maximum: 5719096.61
Range: 5719096.61
Interquartile range (IQR): 21901.78

Descriptive statistics

Standard deviation: 243965.9645
Coefficient of variation (CV): 4.489778218
Kurtosis: 117.999481
Mean: 54338.08814
Median Absolute Deviation (MAD): 2647.71
Skewness: 9.796454599
Sum: 991615770.6
Variance: 5.951939186e+10
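[Figure: Histogram with fixed size bins (bins=10)]

Common values
Value | Count | Frequency (%)
0 | 2370 | 13.0%
3.33 | 187 | 1.0%
6.67 | 78 | 0.4%
10 | 47 | 0.3%
4.44 | 38 | 0.2%
13.33 | 28 | 0.2%
16.67 | 18 | 0.1%
6.66 | 18 | 0.1%
26.67 | 18 | 0.1%
20 | 14 | 0.1%
Other values (15072) | 15433 | 84.6%

Minimum 5 values
Value | Count | Frequency (%)
0 | 2370 | 13.0%
0.97 | 1 | < 0.1%
1.3 | 1 | < 0.1%
1.33 | 1 | < 0.1%
1.38 | 2 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
5719096.61 | 1 | < 0.1%
4324231.19 | 1 | < 0.1%
4081397.72 | 1 | < 0.1%
4023485.04 | 1 | < 0.1%
3988101.74 | 1 | < 0.1%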

XLarge Bags
Real number (ℝ≥0)

ZEROS

Distinct count: 5588
Unique (%): 30.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3106.426507205874
Minimum: 0.0
Maximum: 551693.65
Zeros: 12048
Zeros (%): 66.0%
Memory size: 142.6 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
median: 0
Q3: 132.5
95th percentile: 12058.452
Maximum: 551693.65
Range: 551693.65
Interquartile range (IQR): 132.5

Descriptive statistics

Standard deviation: 17692.89465
Coefficient of variation (CV): 5.695578058
Kurtosis: 233.6026119
Mean: 3106.426507
Median Absolute Deviation (MAD): 0
Skewness: 13.13975069
Sum: 56689177.33
Variance: 313038521.2
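[Figure: Histogram with fixed size bins (bins=10)]

Common values
Value | Count | Frequency (%)
0 | 12048 | 66.0%
3.33 | 29 | 0.2%
6.67 | 16 | 0.1%
1.11 | 15 | 0.1%
5 | 12 | 0.1%
10 | 9 | < 0.1%
16.67 | 8 | < 0.1%
2.22 | 7 | < 0.1%
150 | 6 | < 0.1%
80 | 6 | < 0.1%
Other values (5578) | 6093 | 33.4%

Minimum 5 values
Value | Count | Frequency (%)
0 | 12048 | 66.0%
1 | 1 | < 0.1%
1.11 | 15 | 0.1%
1.26 | 1 | < 0.1%
1.3 | 1 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
551693.65 | 1 | < 0.1%
454343.65 | 1 | < 0.1%
390478.73 | 1 | < 0.1%
387400.22 | 1 | < 0.1%
377661.06 | 1 | < 0.1%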

type
Categorical

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 142.6 KiB
conventional: 9126
organic: 9123
Value | Count | Frequency (%)
conventional | 9126 | 50.0%
organic | 9123 | 50.0%

Length

Max length: 12
Median length: 12
Mean length: 9.500410981
Min length: 7
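The length statistics are weighted by row count. The mean is (9126 × 12 + 9123 × 7) / 18249 ≈ 9.5004, while the median length is 12 because slightly more than half of the rows are "conventional". The sketch below reproduces the Length block for type from the counts above. It is an illustration of the arithmetic, not pandas-profiling's code.

```python
import statistics

# Row counts of the two categories of "type", taken from the table above.
type_counts = {"conventional": 9126, "organic": 9123}

# One string length per row, so that every statistic is weighted by row count.
lengths = [len(label) for label, count in type_counts.items() for _ in range(count)]

print("Max length:", max(lengths))                          # 12
print("Median length:", statistics.median(lengths))         # 12
print("Mean length:", round(statistics.fmean(lengths), 9))  # 9.500410981
print("Min length:", min(lengths))                          # 7
```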

year
Categorical

Distinct count: 4
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 142.6 KiB
2017: 5722
2016: 5616
2015: 5615
2018: 1296
Value | Count | Frequency (%)
2017 | 5722 | 31.4%
2016 | 5616 | 30.8%
2015 | 5615 | 30.8%
2018 | 1296 | 7.1%

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

region
Categorical

HIGH CARDINALITY
UNIFORM

Distinct count: 54
Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 142.6 KiB
Albany: 338
NewOrleansMobile: 338
California: 338
Atlanta: 338
DallasFtWorth: 338
Other values (49): 16559
Value | Count | Frequency (%)
Albany | 338 | 1.9%
NewOrleansMobile | 338 | 1.9%
California | 338 | 1.9%
Atlanta | 338 | 1.9%
DallasFtWorth | 338 | 1.9%
HartfordSpringfield | 338 | 1.9%
Tampa | 338 | 1.9%
Pittsburgh | 338 | 1.9%
Boston | 338 | 1.9%
Seattle | 338 | 1.9%
Other values (44) | 14869 | 81.5%

Length

Max length: 19
Median length: 9
Mean length: 10.29535865
Min length: 4

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation and 1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, provided the scale factors have the same sign. This implies that for an exactly linear relationship, the angle of the line to the x-axis does not affect r (as long as the line is neither horizontal nor vertical).

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
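This definition translates directly into Python. The sketch below is illustrative, and the function name `pearson_r` is not part of any library. The usage lines also show the invariance under shifting and positive scaling described above.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/n (or 1/(n - 1)) normalisation cancels between numerator and denominator.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
print(pearson_r(x, y))                         # ≈ 0.8
print(pearson_r([10 * a + 7 for a in x], y))   # unchanged by shifting and positive scaling of x
print(pearson_r([-10 * a + 7 for a in x], y))  # a negative scale factor flips the sign
print(pearson_r(x, [3 * a + 2 for a in x]))    # exactly linear: 1 up to rounding
```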

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation and 1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
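This calculation can be sketched in two steps. First rank each variable, giving tied values the average of the ranks they span (a common convention, assumed here). Then apply Pearson's formula to the ranks. The function names are illustrative.

```python
import math


def average_ranks(values):
    """1-based ranks in which tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def pearson_r(x, y):
    """Covariance divided by the product of the standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the (average) ranks of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]    # monotonic but not linear
print(pearson_r(x, y))     # below 1: the relationship is not linear
print(spearman_rho(x, y))  # 1 up to rounding: the relationship is perfectly monotonic
```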

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation and 1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n-1)/2 for n observations.
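Counting pairs directly gives the sketch below. It implements the tau-a form described above, in which a pair tied in either variable counts as neither concordant nor discordant. SciPy's `kendalltau` defaults to the tie-adjusted tau-b, so results can differ when ties are present. The function name is illustrative, the inputs are assumed to be numeric, and the O(n²) loop is for illustration only.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total pairs, with total = n(n-1)/2."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1  # both variables move in the same direction
        elif sign < 0:
            discordant += 1  # the variables move in opposite directions
        # sign == 0: a tie in x or y; such a pair counts as neither
    return (concordant - discordant) / (n * (n - 1) / 2)


print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))  # 5 concordant, 1 discordant: (5 - 1) / 6
print(kendall_tau_a([1, 2, 3], [3, 2, 1]))        # all pairs discordant: -1.0
```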

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimator of Cramér's V has been shown to be biased, even for large samples, so we use the bias-corrected measure proposed by Bergsma in 2013.
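The bias-corrected estimator can be sketched as follows, assuming the commonly cited form of Bergsma's correction. In that form, φ² = χ²/n is reduced by (k − 1)(r − 1)/(n − 1) and clipped at 0, the table dimensions are corrected to r̃ = r − (r − 1)²/(n − 1) and k̃ = k − (k − 1)²/(n − 1), and Ṽ = sqrt(φ̃² / min(k̃ − 1, r̃ − 1)). The sketch uses the plain Pearson chi-squared statistic without continuity correction. pandas-profiling's own implementation may differ in such details, and the function name is illustrative.

```python
import math
from collections import Counter


def cramers_v_corrected(x, y):
    """Bias-corrected Cramér's V for two equally long sequences of category labels."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    observed = Counter(zip(x, y))
    row_totals = Counter(x)
    col_totals = Counter(y)

    # Pearson chi-squared statistic of the r x k contingency table (no continuity correction).
    chi2 = 0.0
    for a, row_total in row_totals.items():
        for b, col_total in col_totals.items():
            expected = row_total * col_total / n
            chi2 += (observed[(a, b)] - expected) ** 2 / expected

    r, k = len(row_totals), len(col_totals)
    phi2_corr = max(0.0, chi2 / n - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    if denom <= 0:
        # A variable with a single category: V is undefined; returning 0.0 is a choice made here.
        return 0.0
    return math.sqrt(phi2_corr / denom)


same = ["a"] * 50 + ["b"] * 50
print(cramers_v_corrected(same, ["c"] * 50 + ["d"] * 50))  # perfect association: about 1
print(cramers_v_corrected(["a", "a", "b", "b"] * 25, ["c", "d", "c", "d"] * 25))  # independent: 0.0
```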

Missing values

Sample

First rows

(index) | Unnamed: 0 | Date | AveragePrice | Total Volume | 4046 | 4225 | 4770 | Total Bags | Small Bags | Large Bags | XLarge Bags | type | year | region
0 | 0 | 2015-12-27 | 1.33 | 64236.62 | 1036.74 | 54454.85 | 48.16 | 8696.87 | 8603.62 | 93.25 | 0.0 | conventional | 2015 | Albany
1 | 1 | 2015-12-20 | 1.35 | 54876.98 | 674.28 | 44638.81 | 58.33 | 9505.56 | 9408.07 | 97.49 | 0.0 | conventional | 2015 | Albany
2 | 2 | 2015-12-13 | 0.93 | 118220.22 | 794.70 | 109149.67 | 130.50 | 8145.35 | 8042.21 | 103.14 | 0.0 | conventional | 2015 | Albany
3 | 3 | 2015-12-06 | 1.08 | 78992.15 | 1132.00 | 71976.41 | 72.58 | 5811.16 | 5677.40 | 133.76 | 0.0 | conventional | 2015 | Albany
4 | 4 | 2015-11-29 | 1.28 | 51039.60 | 941.48 | 43838.39 | 75.78 | 6183.95 | 5986.26 | 197.69 | 0.0 | conventional | 2015 | Albany
5 | 5 | 2015-11-22 | 1.26 | 55979.78 | 1184.27 | 48067.99 | 43.61 | 6683.91 | 6556.47 | 127.44 | 0.0 | conventional | 2015 | Albany
6 | 6 | 2015-11-15 | 0.99 | 83453.76 | 1368.92 | 73672.72 | 93.26 | 8318.86 | 8196.81 | 122.05 | 0.0 | conventional | 2015 | Albany
7 | 7 | 2015-11-08 | 0.98 | 109428.33 | 703.75 | 101815.36 | 80.00 | 6829.22 | 6266.85 | 562.37 | 0.0 | conventional | 2015 | Albany
8 | 8 | 2015-11-01 | 1.02 | 99811.42 | 1022.15 | 87315.57 | 85.34 | 11388.36 | 11104.53 | 283.83 | 0.0 | conventional | 2015 | Albany
9 | 9 | 2015-10-25 | 1.07 | 74338.76 | 842.40 | 64757.44 | 113.00 | 8625.92 | 8061.47 | 564.45 | 0.0 | conventional | 2015 | Albany

Last rows

(index) | Unnamed: 0 | Date | AveragePrice | Total Volume | 4046 | 4225 | 4770 | Total Bags | Small Bags | Large Bags | XLarge Bags | type | year | region
18239 | 2 | 2018-03-11 | 1.56 | 22128.42 | 2162.67 | 3194.25 | 8.93 | 16762.57 | 16510.32 | 252.25 | 0.0 | organic | 2018 | WestTexNewMexico
18240 | 3 | 2018-03-04 | 1.54 | 17393.30 | 1832.24 | 1905.57 | 0.00 | 13655.49 | 13401.93 | 253.56 | 0.0 | organic | 2018 | WestTexNewMexico
18241 | 4 | 2018-02-25 | 1.57 | 18421.24 | 1974.26 | 2482.65 | 0.00 | 13964.33 | 13698.27 | 266.06 | 0.0 | organic | 2018 | WestTexNewMexico
18242 | 5 | 2018-02-18 | 1.56 | 17597.12 | 1892.05 | 1928.36 | 0.00 | 13776.71 | 13553.53 | 223.18 | 0.0 | organic | 2018 | WestTexNewMexico
18243 | 6 | 2018-02-11 | 1.57 | 15986.17 | 1924.28 | 1368.32 | 0.00 | 12693.57 | 12437.35 | 256.22 | 0.0 | organic | 2018 | WestTexNewMexico
18244 | 7 | 2018-02-04 | 1.63 | 17074.83 | 2046.96 | 1529.20 | 0.00 | 13498.67 | 13066.82 | 431.85 | 0.0 | organic | 2018 | WestTexNewMexico
18245 | 8 | 2018-01-28 | 1.71 | 13888.04 | 1191.70 | 3431.50 | 0.00 | 9264.84 | 8940.04 | 324.80 | 0.0 | organic | 2018 | WestTexNewMexico
18246 | 9 | 2018-01-21 | 1.87 | 13766.76 | 1191.92 | 2452.79 | 727.94 | 9394.11 | 9351.80 | 42.31 | 0.0 | organic | 2018 | WestTexNewMexico
18247 | 10 | 2018-01-14 | 1.93 | 16205.22 | 1527.63 | 2981.04 | 727.01 | 10969.54 | 10919.54 | 50.00 | 0.0 | organic | 2018 | WestTexNewMexico
18248 | 11 | 2018-01-07 | 1.62 | 17489.58 | 2894.77 | 2356.13 | 224.53 | 12014.15 | 11988.14 | 26.01 | 0.0 | organic | 2018 | WestTexNewMexico